NSF PAR Search | NSF Public Access Repository


Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0

https://doi.org/10.3390/bdcc6010027

Kirkpatrick, Anna; Onyeze, Chidozie; Kartchner, David; Allegri, Stephen; Nakajima An, Davi; McCoy, Kevin; Davalbhakta, Evie; Mitchell, Cassie S. (March 2022, Big Data and Cognitive Computing)

Literature-based discovery (LBD) summarizes information and generates insight from large text corpuses. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is a comprehensive open-source software for significantly faster, more effective, and user-friendly means of automated biomedical LBD. An example case is performed to rank relationships between Alzheimer’s disease and metabolic co-morbidities.
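As a rough illustration of the two ideas named in this abstract (an in-memory graph index in place of a database query, and HeteSim over a metapath), the following is a minimal sketch and not the SemNet 2.0 implementation. The class `TypedGraph`, the functions `walk_distribution`, `hetesim_even`, and `build_toy_graph`, and the node and relation names are all hypothetical. It computes the exact, normalized HeteSim score for even-length metapaths only, rather than the randomized approximations the abstract mentions.

```python
import math
from collections import defaultdict


class TypedGraph:
    """Hypothetical dictionary-backed index of typed, directed edges."""

    def __init__(self):
        self._out = defaultdict(set)  # (node, relation) -> successor nodes
        self._in = defaultdict(set)   # (node, relation) -> predecessor nodes

    def add_edge(self, source, relation, target):
        self._out[(source, relation)].add(target)
        self._in[(target, relation)].add(source)

    def neighbors(self, node, relation, reverse=False):
        index = self._in if reverse else self._out
        return index.get((node, relation), set())


def walk_distribution(graph, start, relations, reverse=False):
    """Probability of reaching each node by a uniform random walk that
    follows the given relations in order. Walks that hit a dead end
    simply drop out, so the result may sum to less than 1."""
    dist = {start: 1.0}
    for relation in relations:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            nbrs = graph.neighbors(node, relation, reverse)
            for nbr in nbrs:
                nxt[nbr] += prob / len(nbrs)
        dist = dict(nxt)
    return dist


def hetesim_even(graph, source, target, metapath):
    """Normalized HeteSim for an even-length metapath: cosine similarity of
    the distributions with which source and target reach the middle nodes."""
    if len(metapath) % 2 != 0:
        raise ValueError("this sketch handles even-length metapaths only")
    half = len(metapath) // 2
    left = walk_distribution(graph, source, metapath[:half])
    # Walk backwards from the target along the second half of the metapath.
    right = walk_distribution(
        graph, target, list(reversed(metapath[half:])), reverse=True
    )
    dot = sum(p * right.get(node, 0.0) for node, p in left.items())
    norm = math.sqrt(sum(p * p for p in left.values())) * math.sqrt(
        sum(p * p for p in right.values())
    )
    return dot / norm if norm else 0.0


def build_toy_graph():
    """Made-up edges for illustration; not biomedical data."""
    graph = TypedGraph()
    graph.add_edge("source_node", "r1", "mid_a")
    graph.add_edge("source_node", "r1", "mid_b")
    graph.add_edge("mid_a", "r2", "target_node")
    graph.add_edge("mid_b", "r2", "other_node")
    return graph
```

Calling `hetesim_even(build_toy_graph(), "source_node", "target_node", ["r1", "r2"])` scores how similarly the two endpoints reach the shared middle layer. Keying the index by `(node, relation)` makes each metapath step a single dictionary lookup, which is one plausible reason an in-memory structure could outperform a general graph database for this access pattern.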
Full Text Available
RNAStructViz: graphical base pairing analysis

https://doi.org/10.1093/bioinformatics/btab197

Schmidt, Maxie Dion; Kirkpatrick, Anna; Heitsch, Christine (April 2021, Bioinformatics)
Gorodkin, Jan (Ed.)
Summary: We present a new graphical tool for RNA secondary structure analysis. The central feature is the ability to visually compare/contrast up to three base pairing configurations for a given sequence in a compact, standardized circular arc diagram layout. This is complemented by a built-in CT-style file viewer and radial layout substructure viewer which are directly linked to the arc diagram window via the zoom selection tool. Additional functionality includes the computation of some numerical information, and the ability to export images and data for later use. This tool should be of use to researchers seeking to better understand similarities and differences between structural alternatives for an RNA sequence.
Availability and implementation: https://github.com/gtDMMB/RNAStructViz/wiki
Full Text Available
Markov Chain-Based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model and Extended Applications

https://doi.org/10.3390/mca25040067

Kirkpatrick, Anna; Patton, Kalen; Tetali, Prasad; Mitchell, Cassie (December 2020, Mathematical and Computational Applications)
Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resultant values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and to examine branching structure dependence on energy model parameters. Full exposition is provided for the mathematical techniques used with the expectation that these techniques will prove useful in bioinformatics, computational biology, and additional extended applications.
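To make the idea of a Markov chain converging to a Gibbs distribution concrete, here is a minimal sketch under several assumptions; it is not the chain constructed in the paper. It encodes a plane tree with n edges as a Dyck path (a related but different encoding from the 2-Motzkin paths named above), uses a Metropolis rule with adjacent-step swaps, and uses a made-up energy that charges a fixed weight per leaf. The names `toy_energy`, `metropolis_step`, and `sample_path`, and the `leaf_weight` parameter, are hypothetical.

```python
import math
import random


def is_dyck(path):
    """True if the +1/-1 steps never dip below zero and end at zero."""
    height = 0
    for step in path:
        height += step
        if height < 0:
            return False
    return height == 0


def count_leaves(path):
    """Leaves of the encoded plane tree are peaks: an up step followed by a down step."""
    return sum(1 for a, b in zip(path, path[1:]) if a == 1 and b == -1)


def toy_energy(path, leaf_weight=1.0):
    """Hypothetical energy; a stand-in for an NNTM-derived energy function."""
    return leaf_weight * count_leaves(path)


def metropolis_step(path, energy, rng):
    """Propose swapping two adjacent steps; accept with the Metropolis rule.
    The proposal is symmetric, so the stationary distribution is
    proportional to exp(-energy(path)) over all Dyck paths of this length."""
    i = rng.randrange(len(path) - 1)
    if path[i] == path[i + 1]:
        return path  # swapping equal steps changes nothing
    proposal = path[:]
    proposal[i], proposal[i + 1] = proposal[i + 1], proposal[i]
    if not is_dyck(proposal):
        return path  # reject moves that leave the state space
    delta = energy(proposal) - energy(path)
    if delta <= 0 or rng.random() < math.exp(-delta):
        return proposal
    return path


def sample_path(n_edges, steps, energy=toy_energy, seed=0):
    """Run the chain from the single-branch tree; requires n_edges >= 1."""
    rng = random.Random(seed)
    path = [1] * n_edges + [-1] * n_edges
    for _ in range(steps):
        path = metropolis_step(path, energy, rng)
    return path
```

Running `sample_path(20, 10000)` returns one approximate sample; repeating with different seeds and tabulating `count_leaves` gives an empirical view of how a branching statistic is dispersed under the chosen energy. This sketch says nothing about mixing time, which is the quantity the paper bounds for its own chain.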
Full Text Available
The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization

https://doi.org/10.1016/j.jsb.2020.107475

Poznanović, Svetlana; Barrera-Cruz, Fidel; Kirkpatrick, Anna; Ielusic, Matthew; Heitsch, Christine (April 2020, Journal of Structural Biology)

Full Text Available
